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Dedication To the memory of Daniel Shanks (1917-1996). 
Abstract 

In this partly expository paper, we prove two results. 

• That the two-sided continued fraction of the normalized square root 
(an important part of the SQUFOF algorithm) has several very at- 
tractive properties - periodicity, a symmetry point corresponding to 
a factorization of N, and so on. 

• The infrastructure distance formula. 

Finally, we describe a method for parallelization of SQUFOF that main- 
tains its efficiency per procesor as the number of processors increases, and 
thus is predicted to be useful for very large numbers of processors. 

1 Introduction 

Though there are many fast algorithms for factoring numbers, this paper focuses on 
one known as square forms factorization or SQUFOF (see Algorithm 2] below for a 
precise description). Daniel Shanks developed SQUFOF in the 1970's, and it is still the 
fastest known algorithm for factoring integers in the 20- to 30-digit range. SQUFOF is 
used to this day in conjunction with other factorization algorithms that need to factor 
20-digit numbers in order to generate the factors of higher digit numbers. 

Most of the Shanks' original work on SQUFOF was not published (see however, 
Shi ) and his notes are incomplete 1 . One purpose of this paper is to present Shanks's 
original SQUFOF algorithm in its entirety. The paper goes on to present several 
results concerning both traditional SQUFOF and its parallelization. 

This paper contains three main results: 

1. A proof that the two-sided continued fraction of the normalized square root (an 
important part of the SQUFOF algorithm) has several very attractive properties 
- periodicity, a symmetry point corresponding to a factorization of N, and so 
on (see Theorems 12. 61 |2~S1 and 12. 91 for details). This result was probably known 
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200 years ago to Lagrange and Galois and Gauss - see for example, Perron 
Buell |Bu], and Williams |WJ. 

2. A proof of the infrastructure distance formula, Theorem 12.101 below, which is 
also an important part of SQUFOF. This is in some sense well-known but a 
proof has not, as far as we can see, appeared in the literature. (However, see 
Cohen |Coh2| . Proposition 5.8.4, and Williams and Wunderlich IWW| Theorem 
5.2 for closely related results.) 

3. Investigation of a method for parallelization of SQUFOF that maintains its effi- 
ciency per procesor as the number of processors increases, and thus is predicted 
to be useful for large numbers of processors. (See |W|. [WW], and |Go| for work 
on similar ideas.) The implementation in C, and subsequent numerical data due 
to the first author, is new as far as we know. This is briefly sketched in §4 below. 

Although the theoretical results in this paper are known to the experts, it is hoped 
that putting all these results together will serve a useful purpose. This paper is a 
version of the first author's undergraduate "Trident" thesis, advised by the second 
two authors. 



2 Continued Fractions and Quadratic Forms 



The stepping stone for SQUFOF is the continued fraction expansion for the square 
root of N. (We slightly simplify matters by instead using the "normalized square 
root (equation 0} here.) The terms of this continued fraction expansion give rise to 
a sequence of quadratic forms of discriminant N via We shall describe SQUFOF 
in terms of the "cycle" of continued fractions in the periodic expansion of @ and the 
corresponding quadratic forms. 

2.1 Integral binary quadratic forms 

There is a "dictionary" between certain aspects of 

• indefinite integral binary quadratic forms, 

• ideals in a real quadratic number field, 

• the simple continued fraction of quadratic surds. 

The reader will be assumed to be familiar with at least the basic aspects of this 
correspondence. For details, see for example, Buell [Bu| . Lenstra |Len| . Williams |W| 
(especially pp. 641-645), Cohen |Cohl| and the references found there, or |M|. 

A binary quadratic form (or simply a "form" ) is a homogeneous form of degree 
two in two variables x, y, 



for some constants a,b,c. This form shall also be denoted by the triple (a, b, c). The 
discriminant 2 of / is D — disc(f) — b 2 — 4ac. We shall focus on the case D > 0, 
in which case the form is called indefinite. From now on, we assume without further 
mention that D > is a non-square such that D = (mod 4) or D = 1 (mod 4). 

2 Sometimes also called the determinant of /. 
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If a, 6, c G Z then we say / is integral. If moreover gcd(a,b,c) = 1, then we say 
the form is primitive. Let F(D) denote the set of all integral forms of discriminant 
D and let F(D) P denote the subset of primitive ones. 

The groups 

GL 2 (Z) = { 7 =(* | s,t,u,v<EZ, det( 7 ) = ±l}, 

and 

SL 2 (Z) = {76 GL 2 (Z) I det( 7 ) = 1} 
act on the polynomials Z[x,y] via 

7 = ( * * J : (a;, y) 1 — > (sx + ty, ux + vy). 
Therefore, they also act on the set of integral forms via 



for 7 G GL2 (Z) . In terms of the symmetric matrix A = I , a . ^/^ I associated to 



(7*7)0,2/) = fi sx + ty,ux + vy), 

6/2 c 

the form /, this action may be epressed as 

7* (A) = 7 • A ■ 7. 

We say that two forms /1, / 2 are equivalent if / 2 = 7*/i, for some 7 £ GL 2 (Z). We 
say that two forms /1, / 2 are properly equivalent, written f\ ~ / 2 , if / 2 = 7*/i, for 
some 7 G SL 2 (Z). For / G F(£>), we let 

F{D)f = [f] = {/' G F(D) I / ~ /'} 

denote the proper equivalence class of /. An element 7 G GL 2 (Z) is called an au- 
tomorph of / if 7*/ = /. A form / is called ambiguous if it has an automorph 
in GL 2 (Z) — 5'I/ 2 (Z). Note that if / G F(D) is ambiguous then each /' G [/] is also 
ambiguous. 

We say that two forms (ai, 61, ci), (a 2 , 6 2 , c 2 ) G F(D) are adjacent if ci = a 2 
and 61 + 6 2 = (mod 2a 2 ). In this case, we say that (a 2 ,6 2 ,c 2 ) is to the right of 
(ai, 61, ci) ((ai, 61, ci) is to the left of (a 2 , 6 2 , c 2 )). 



2.1.1 Reduction 

A form (a,6,c) is called reduced if \D 1/2 - 2\a\\ < b < D 1/2 . Let F(D) r denote the 
subset of reduced forms of discriminant D. 

Lemma 2.1. (a) Given any f G F(D) r there is a unique /' G F(D) r adjacent to the 
right of f and a unique f" G F(D) r adjacent to the left of f. 

(b ) There are exactly two reduced ambiguous forms in a cycle of reduced forms in 
an ambiguous class. 

For (a) see Buell Bu , page 23; for (b), see IBul . Theorem 9.12. Lemma [2.11 allows 
us to define the cycle of reduced forms associated to / G F(D) r : it is the set of all 
/' G F(D) r which is adjacent to the left or right of /. This cycle is denoted F(D) r j. 
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Lemma 2.2. An ambiguous equivalence class contains two points of symmetry, that 
is, pairs of reduced adjacent forms, (c,b,a) to the left of(a,b,c), in the cycle that are 
the symmetric reverse of each other. In that case, either a divides the determinant, or 
a/2 divides the determinant. 

This follows from Theorem 12.91 below. 

It is evident that if a form is ambiguous, then each form in its equivalence class is 
also ambiguous. 

Proposition 2.3. The set F(D) r of reduced forms can be partitioned into cycles of 
adjacent forms. 

Consider the action of 



on a form (a,b,c): T m (a,b,c) 
This defines a map T m : F(D) 
Consider the action of 



on a form (a,b,c): W(a,b,c) = (a',b',c'), where a' — c, b' = —b, c' — a. This defines 
a map W : F(D) -> F(D). 

Algorithm 1. (Reduction) 
Input: / G F(D). 
Output: /' G F(D) r with / ~ /'. 
Let f(x, y) = ax 2 + bxy + cy 2 and let 



1 m 
1 



Fjji - 

\ u i j 

! _ {b') 2 -D 



(a' , b' , c'), where a' = a, b' — b + 2am, d = - 4a , 
-> F(D), for each m G Z. 



W = 



-1 

1 n 



J a ,D ={x \ - \a\ < x < \a\, if |o| > D 1/2 , -2\a\ < x < D 1/2 , if \a\ < D 1/2 }. 

1 . Apply T m to (a, b, c) to obtain a form (a, b', c'), where b' g J a .D and c' is chosen 

_ so, thai the .new form has discriminants. 2 ,. , - _ . 

2. It (a,b ,c J is reduced then return j (x,y) = ok +6 xy + cy . Otherwise, replace 

(a, 6', c') by V7(a, b', c') = (c , -b' ' , a) and go to step 1 . 

According to Lagarias |L1| . this has complexity 0(log(max(\a\, \b\, |c|))). 
Define the adjacency map p : F(D) — ► F(D) by 

p(a,b,c) = (a',b',c), (1) 

where a' = c, b' G J c ,d, and 6' = —6 (mod 2c), and c' is determined by the condition 
disc(a',b',c') = D. This defines a bijection p : F(D) r -» F(D) r . 

Unfortunately, given / G F(D) with D > there are usually several /' G F(D) r 
which are properly equivalent to /. In other words, the cycle 

F(D) rJ = {/' G F(D) r | / ~ /'} = {/' = p n f | n G Z} 

can be rather large. Indeed, it is known that \F(D) r j\ = 0(D 1 ^ 2+e ), (where the O- 
constant depends on e) for all e > 0, where the exponent 1/2 is best possible (Lagarias 
|LeiillLl) ') and . 
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2.1.2 Composition 



The composition of forms has important properties for SQUFOF. The rules of com- 
position are fairly general. A binary quadratic form F is called a composition of 
/, g G F(D) if it satisfies an equation such as 

f{x,y)g{u,v) = F(Bi(x,y,u,v),B 2 (x,y,u,v)), (2) 

where B\ and B2 are quadratic forms in x,y,u,v of a certain type. The exact con- 
ditions Bi , B2 satisfy do not concern us here (see Cox |Uox| if you are curious and 
Gauss [H] if you are really curious). The point is that there may be more than one 
pair B\, B2 satisfying @, so that the composition F is not unique. (However, the con- 
ditions on B\,B2 specified by Gauss do imply that, for a given f,g £ F(D) any two 
such compositions must be equivalent to each other.) One way around this ambiguity 
is to specify a choice of B\ , B2 and hence define F uniquely. 

The idea described below was known in some form to Dirichlet and possibly Gauss. 

Algorithm 2. Input: (oi,6i,ci), (a 2 ,&2,c 2 ) G F(D). 
Output: A composition B, (B \l°^ ) e F{D). 

1 . Compute m = gcd(a ly a 2 , bl 1 2 b2 ). (Since D = b\ - -iaid, for z = 1,2, 61 and b 2 

have the same parity.) 

2. Solve the congruences 

a,2vnB = mb\a,2 (mod 201O2), 
a\mB = m&2«i (mod 20102), 
b -^mB = m blb l +D (mod 2a lH2 ), 

simultaneously an integer B. Choose the solution with smallest absolute value. 

See |Shlj or |Bu| for a proof of the correctness of this algorithm. Buell |Bu| also 
provides the substitutions that would be needed for Gauss's definition of composition. 

In other words, we define the composition of (01, 61, ci), (02, &2, C2) G F(D) to be 
the form resulting from the above algorithm: 

, , \ / t , t oia 2 (B 2 - D)m 2 

(ai,bi,ci) * (a2,b2,C2) = ( — tt,B, ). 

m, z 4aia2 

Remark. The binary operation * : F(D) x F(D) — > F(D) is associative but not 
its "restriction" # : F(D) r x F(D) r — > F(D) r (where # is composition algorithm |5| 
followed by reduction algorithm 0. 

Let /, g G F(D) r be elements in the principal cycle of discriminant D. It was 
observed by Shanks (see §5 in Lenstra |Lenp that cycles enjoy a "coset-like property" 
p k f$ i p e g = p°" k l for some G 1 . In particular, the principal cycle is closed 

under composition. Therefore, the the set of complete quotients of the continued 
fraction of such an a can be identified with a set closed under 

For further discussion of this, see Lenstra |Len| (5.1). 

The "structure" of a cycle has been termed the "infrastructure" of F(D) by Shanks. 

H /i /', 3i <?') h £ F(D) then Gauss showed 

(a) (/ * g) * h ~ / * (g * h), and 

(b) / ~ /' and g ~ g imply f * g ~ f * g' . 
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These imply that the set of equivalence classes of forms of discriminant D is a group 
C(D), called the class group of D. From the construction, it is clear that f*g ~ <?*/, 
so C(D) is abelian. 

The following Theorem was known to Shanks, since SQUFOF depends essentially 
on it. 

Theorem 2.4. An equivalence class has order 2 or 1 in the class group if and only if 
it is ambiguous. 

Any form (1,6, c) € F(D) acts as the identity for *. The cycle of the identity is 
the principal cycle of forms. Any form / whose square f 2 = / * / belongs to the 
principal cycle is an ambiguous form ( |Bu| . Corollary 4.9). 



2.2 Continued fractions 

Throughout, assume that N = 1 (mod 4) and is not a perfect square. 

We shall only consider simple continued fractions here. In other words, if a G R 
is the number we want to compute the continued fraction of, let xo — a, bo — \_ct\ , 
where [x\ denotes the floor of x, and, for i > 0, let 

bi = [ Xi \ . (3) 



The term Xi is called the i th complete quotient of a and bi is called the i th partial 
quotient of a. The simple continued fraction of a is ( HW ): 

1 



a = b 



6i 



&2+— 

also written [bo, 6i, 62, ...]. We are only concerned with continued fractions of an irra- 
tional a £ K = Q(\/iV). In this case, the sequence 60, 61, 62, ... is eventually periodic. 
For example, let 
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[ ^j^' , \vw\ odd. (4) 

We call this a the normalized square root of TV. The continued fraction sequence 
60,61,... is (purely) periodic. In general, the period of a is the size of the cycle 
associated to the identity in the class group (Buell |Bu| . Theorem 3.18 (a)). 

At each step in the continued fraction expansion, it is possible to simplify Xi — bi 
to the form ^Q Fi S [0, 1), where Pi,Qi £ 1 satisfy Pf = TV (mod Qi). In general, 
if P, Q are positive integers and x = v/ ^ +p satisfies P 2 = N (mod Q), < P < s/N, 
l^/N — Q\ < P, then we say that x is reduced. It is known that if x, y are two such 
reduced numbers and y = 7(2;) (where 7 = [ ^ ] (E SLi(li) acts on R = RU {00} 



d t 

by j(x) — ff^) then y occurs in the simple continued fraction expansion of 1 as a 
complete quotient (and x occurs in the simple continued fraction expansion of y as a 
complete quotient). See Buell |Bu| . Proposition 3.20 for a proof. 

If P, Q are positive integers and x = ^- j +p then we associate to x the quadratic 
forms 

P 2 - N P 2 - N 

f. = (-Q/2,P, ^-), f + = {Q/2,P 1 t-^L), (5) 



G 



which have discriminant N. (We implicitly assume here that P 2 q N £ Z and Q is even. 
Note that if a: is reduced then so are f±, and conversely.) 

Lemma 2.5. (H. Cohen §5.7.1) The continued fraction expansion of the 

quadratic irrational corresponding to the unit reduced form is not only periodic but 
symmetric. 

What is the continued fraction analog of "adjacency" of forms? Applying the 
adjacency map Q is roughly analogous to the "stepping" process of going from one 
complete quotient to the next in a continued fraction. See Williams §5 for a discussion 
of the the ideal-theoretic analog, at least for the case of the simple continued fraction 
of =l±s^. 

One tool used by many different algorithms is the continued fraction expression for 
where N is the number to be factored. This expression is calculated recursively: 
xq — a, bo — [xq\ , and using in general. Observe that solving equation © for 
Xi-i gives Xi-\ = bi-i + 

The recursive formulas are, for i > 0, 

Xi+i = — rr- 



■/N+Pj ,„v 

6 *+i + o i+1 > 



bi = [x t \. 

Theorem 12.61 provides some well-known fundamental properties and identities of con- 
tinued fractions. 

Theorem 2.6. (JT$) 

In the continued fraction expansion of with xo = a, each Xi reduces to the 

form '^g" P '~ 1 , with unique Qi,Pi G Z satisfying 
(a) N = Pl + QiQ i+1 , 



(b) Pi = hQi-Pi-i, 



(c) h = [ 



> 1, 



(d) < Pi < VN, 

(e) IvOV-Qil < Pi^, 

(f) Qi is an integer, 

(g) = Qi-i + bi{Pi-i - Pi). 

(h) This sequence is eventually periodic. 

These denominators {Qi} will be referred to as pseudo-squares. (Indeed, for i > 
0, if we write [&o, &i, ...&»] = §- then A^-Bj^N = (-1) 4 Q 4 and so A^i = (-l)*<3i 
(mod N).) 
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Remark. The fact that each Xi reduces to the form g 1 1 is important for compu- 
tational efficiency because this together with (c) imply that floating point arithmetic is 
not necessary for any of these calculations. Also, by use of (b) and (g), the arithmetic 
used in this recursion is on integers < 2y/N. 

Since the continued fraction is eventually periodic, it is reasonable to consider 
that when it loops around on itself, the terms being considered may have come from 
some terms "earlier" in the recursion. Lemma |2 . 71 shows that by exchanging these two 
related expressions, the direction is reversed. The algorithm for stepping a continued 
fraction expansion in the opposite direction will be precisely the same as the one for 
the forward direction, except that the numerator is changed first. Note that this same 
change (with the exception of Co) could be achieved by merely changing the sign of 
Pi-i- 

Lemma 2.7. Let N, and, for i > 0, let Xi,bi, Pi,Qi be as in Theorem \2. 61 Let 
yo = ^t+r 1 and let c ° = LyoJ ■ If we define, forj > 1, Vj = y ._ 1 \ c ._ 1 , Cj-i = [yj-i] 
then Co = bi+i and yj = V ^ +P '^ +1 , when < j < i. 

Using Lemma 12.71 to go backwards in the continued fraction expansion, denote the 
terms before xq as X—\, x_2, .... The sequence {xi \ i G Z} will be called the two-sided 
continued fraction of xo- Define Q-i and P-i similarly, i > 0. 

Theorem 2.8. (a) With these conventions on the negative indices, Theorem \2. 61 av- 

plies for all i £ Z. 

(b) Define Xi as in Theorern \2. 61 i£Z. There exists a positive integer it such that 
for all i £ Z, Xi = Xi+n- 

(c) Let xo — a such that Qo \ 2P_i (as in equation The sequence of pseudo- 
squares is symmetric about Qo, so that for all i £ Z, Qi = Q-i. 

This follows easily from the lemma above so the proof is omitted. 

This demonstrates an important fact about continued fractions: that the direction 
of the sequences of pseudo-squares and residues can be reversed (i.e. the indices 
decrease) by making a slight change and applying the same recursive mechanism. The 
presence of one point of symmetry allows a proof that another point of symmetry 
exists and that a factorization of iV may be obtained from this symmetry 3 : 

Theorem 2.9. Let s = |_f J, where ir is the period from Theorem \2.fi[ If n is even 
then (a) Q s+l = Q a - % , (b) Q 3 / Q , (c) P a = and (d) Q s \ 2N , for all i £ Z. If 

7r is odd then, for all i G Z, 

• Qs+i+i = Qs-i, and 

• either (a) gcd(Q s , iV) is a nontrivial factor of N , or (b) — 1 is a quadratic residue 
ofN. 

The argument for the first statement is in |W| . pages 641-642. For an elementary 
proof of both statements, see |M] , 

3 This was actually discovered in the opposite order. It was clear that ambiguous forms 
that met this criteria provided a factorization but was later realized that these same forms 
produced symmetry points. This was first noticed by Gauss |H] and first applied by Shanks 
I5h4l . 
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2.3 Infrastructure distance formula 



For m < n, and for {xt}i S z, the terms in the continued fraction in (|SJ, Shanks defined 
infrastructure distance by 

D(x m ,x n ) = log( f[ x k J ■ (7) 

\fc = m + l / 

We abuse notation and write D(F m , F„) as well for this quantity, where a form F 
corresponds to a term x in the continued fraction via the map x i — > /+ @. Lenstra 
|Len| adds a term of i log(Q n /Q m ) to this (where Q denotes the pseudo-square term 
of x), with the effect that the resulting formulas are slightly simplified but the proofs 
are more complicated and less intuitive. Definition Q is also used by Williams in |W| . 

Since the quadratic forms are cyclic, in order for the distance between two forms 
to be measured consistently, it must be considered modulo the distance around the 
principal cycle. 

Definition. Let ir be the period of the principal cycle. The regulator R of the class 
group is the distance around the principal cycle, that is, R — D(Fo,F- K ). 

Therefore, distance must be considered modulo R, so that D is a map from pairs 
of forms to the interval [0, R) C R. The addition of two distances must be reduced 
modulo R as necessary. 

Theorem 2.10. (infrastructure distance formula) If Fi ~ Ft are equivalent forms 
and Gi ~ Ge are equivalent forms and D Pt \ is the reduction distance for Fi * Gi and 
D p ,2 is the reduction distance for Ft * Ge and mi and mt are the factors cancelled in 
each respective composition (Algorithm^, then 



D(F!#Gi, F k #Ge) = D(F u F k ) + D(Gi,G t ) + D p , 2 - + log(m 2 /mi) 

proof: Here is a sketch. (For more details, see Theorem A. 5. 2 in \M\.} 
As each quadratic form is associated with a reduced lattice, an analysis of distance 
requires a connection between reduced lattices (see §3 of |W| for the definition of 
reduced lattice) . We use the notation of Williams [ without further mention. 

If C denotes lattice in Q(VjV), let L{C) denote the least positive integer contained 
in it. 

Lemma 2.11. (Lemma A. 4-2 of JMjl) Let I be a primitive ideal and let C denote the 
lattice corresponding to F If CJ is a lattice with basis and for some 6, 6C' = C, 

then the ideal J corresponding to the lattice C! is a primitive ideal and 

(L(I)6)J = (L(J))I (8) 

The method of Voronoi (see for example [W|1 is used to obtain a sequence of 
adjacent minima, corresponding to a sequence of reduced lattices. Consider a sequence 
of lattices Ci, €2,--- corresponding to ideals Ki,K2, ••■ corresponding to binary 
quadratic forms Fi, F2, • • • , corresponding to terms xi,X2, • • ■ in a continued fraction 
expansion JSJ. If, for two adjacent lattices in the sequence, is defined by d+i = 
1/^iJCi, then the chain of adjacent minima of £1 are defined by 6 k — YliZi so 
9 k Ck = jCi (see |W|. §3). Distance between such lattices is then defined by 

D(£ k ,Ce) = \og(e k /6 e ) (9) 
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and this definition of distance corresponds exactly to the definition given for quadratic 
forms (see §6). 

Although this definition has so far only been applied to reduced ideals (for the 
definition of reduced ideal, see for example WJ §2) and lattices, the reduction of ideals 
and lattices corresponding to quadratic form and continued fraction reduction is well 
known: 

Lemma 2.12. (Lemma A. 5.1 in JMjl) Let I be any primitive ideal in ZjyiV]. There 
exists a reduced ideal Ik and a Ok & I such that (L(I)9k)I n = (L(Ik))I ■ 

Here 6k may be efficiently computed by Voronoi's method or by continued fractions. 

Then the reduction distance is defined by D p = — log(#fc) and may be considered as 

the distance from / to Ik- 
Let Ii denote the ideal corresponding to the form Fi in the usual way (as in |Len| ~). 

let Ji be the ideal corresponding to Gi, and let K\ denote the ideal corresponding 

to F\ *G\. We have that (s)Ki = I\J\, for some s. Let Kj be a reduced ideal and 

A G Ki such that 

AA'j = Ki. (10) 

Then Kj is the ideal corresponding to Fi#Gi. 

Similarly, let Ik denote the ideal corresponding to the quadratic form Fk and Je 
be the ideal corresponding to the form Gi. If Hi denotes the ideal corresponding to 
the composition Fk * Gi, then (t)Hi = Ik Je, for some t. Let H be a reduced ideal and 
choose r) G Hi such that r\H = Hi. Then H corresponds to Fk#Gt. 

Let fi and cf> be such that filk = Ii and 4>Ji — Ji. Combining these equations, 
gives 

Kj = Kl/X = hJl/Xs = M )IkJl = ( ^ )Hl = { ^R )H , 

As At At 

Set i> = ^2 an d then ^H = Kj, so that by ©, 



D(Kj,H) = -log(i>) = -login) - log(0) - Log(»j) + log(X) - log(s/t) 

= D{Ii,Ik) + n(Ji,J t )+D(Hi,Hj)- D{Ki,K 3 )+\o g {t/s), 
as desired. □ 

Remark. Shanks stated Square Forms Factorization has an expected runtime of 
0(\fN) (see Gower |Go) for a detailed discussion of this). 

We explain a related idea remarked on by H. Lenstra |Len| . page 148. 

The idea is to first compute the regulator R. This has complexity 0(N~s +e ), 
assuming the Riemann hypothesis |Len| . Now use the "baby-step giant-step" method 
(as discussed in §13 of |Len| ) to get close to the symmetry point: 

Algorithm 3. (Baby-step giant-step) 

Input: N and R 

Output: Factorization of N 

1 . Compute the form F associated to the first or second steps of the continued 
fraction algorithm of the normalized square root of N, 0. 
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2. while F is not within R/4 of the symmetry point (where distance is judged using 
the distance formula in Theorem l2.10> . 

(a) Store F in a Collection F c 

(b) F = F#F (These are the ''giant-steps") 

3. Use the intermediate forms in F c to compose with F until within log AT of the 

4. U^mglfie 7 forward and backward steps (see Theorem l2~8l of the continued frac- 

5. u^'ggr^i^ P0int 

StepsO El and0] each take 0(log JV), so that the factorization takes 0(iVi +e ). 



3 SQUFOF 

Formally, here is the algorithm for factoring N: 

Algorithm 4. (SQUFOF) 
Input: N. 

Output: A factor of N 



while Q z f= perfect square for some i even 

w * - L^J 

(b) P <- hQi - Pi-i 
c Qi+i <- Qi-i + bi(Pi-i - Pi) 
(d) if i = 2 n for some; n Store (Q,, 2 • Pi) 

bmpose % witrT"sfore6Pforms according to the binary representation of i/2 and 
;ore result \p,F . 

l^ ( M7Pol-rP/2,Q lA - \C\ 

a) Apply same recursive formulas to (Q a ,P ,Qi) and (q ,Po, qi) 
1 Q. f Pi = Pi_i, either Qi or QJ.2 is a nontrivial factor, of r iV. 
1 T. If pi = pi- lt either g? or ^/zis a nontrivial factor of TV. 




3.1 Proof 

Let iV, the number to be factored, not be a perfect square. Expanding the continued 
fraction for yN, let Q be the first square pseudo-square found on an even index. 
Let r = y/Q. Let F — (r 2 ,b,c) be the associated quadratic form. Then (r,b,rc), 
which reduces with reduction distance D p — to G = (r, b' , c') is a reduced quadratic 
form whose square is F. Therefore, by Theorem 12.41 G is ambiguous and thus has a 
symmetry point in its cycle. 

Since by Theorem EHul 2D(G S ,G) = D(F S ,F) (mod R) where F s is the symmetry 
point of the principal cycle with coefficient 1, D(G S ,G) = D(F S , F)/2 (mod R/2). 
Since the two points of symmetry are P/2 away from each other, this means that 
there is a symmetry point at distance D(F a , F)/2 behind G. Therefore, a point of 
symmetry may be found by reversing G and traveling this short distance. Now if the 
coefficient at this symmetry point is ±1, then there would have been a pseudo-square 
in the continued fraction expansion equal to r somewhere before F. If the coefficient 
is 2, then this symmetry point could be composed with G to find 2r at an earlier point 
in the principal cycle. Therefore, if neither r nor 2r were encountered before F in the 
continued fraction expansion, then the symmetry point provides a nontrivial factor for 
N. 
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4 Parallel SQUFOF 



With the large amount of computation required for factorization, the efficiency of a 
parallel implementation is especially important for factorization algorithms (see Brent 
|B7) for a survey and some terminology). 

There have been proposed two ways to parallelize SQUFOF: using multipliers and 
using segments. We will discuss the segments method here. More information on the 
multipliers method can be found in Gower |Go| . 

4.1 Segments 

The segments technique depends upon the ability to use composition to jump to arbi- 
trary locations in the principal cycle. The cycle can be divided into multiple equal-sized 
sub-sequences and each sub-sequence can be searched by one of the processors. As re- 
cently as ANTS 2004, Pomerance suggested investigating parallel SQUFOF (personal 
communication; see also |W| page 645). 

When factoring using SQUFOF parallelized by segments, we choose a quadratic 
form G several steps into the cycle and then square it several times (how many times 
is more an art than a science - it depends on the number of processors and their 
speed and wanting to have segments which finish fast but not too fast, say 20-30 in 
our case). Call the resulting form F. For i > 1, each F M is assigned to processor i 
as a beginning of another segment, [F 2i , p(F 2z ), p 2 (F 2i ), .., F 2z+2 ], where p is the 
adjacency map. When processor i finds a pseudo-square which is a perfect square, 
that form H may used to find the symmetry point as follows (Note H — p 2n {F 21 ), for 
some n). First, take the square root of H and reverse it, call this H' . This is in a new 
cycle of quadratic forms. Next, compose H' with F l , call it H" . Finally, compose H" 
with powers of G to bring it closer to the symmetry point. 

Algorithm 5. (Segment-based Parallel SQUFOF) 
Input: N 

Output: A factor of N 




5. F 



(a) Fi <- * 

1 <— Ft 



Proi 





■'. while Afactor is not found and F start / F end 

(a) Cycle F sta rt forward 2 steps. 

(b) count <— count+1 

(c) if A is a perfect square 



i Ft t <- F~ ' 

- J- test * start 
II- Ftest <— Ftest * FrootS 

iii. for j = size to 1 (This loop composes F test with the necessary 
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A. if count > 2 j forms to bring it close to the symmetry point.) 

B. Ftest <— Ftest * Fj 

C. count <- count -2 J 

D. Search in both directions from Ftest for a symmetry point. 

E. if Factorization found at symmetry point, output and quit. 

5. if A factor is still not found, receive new F at avt, F end , and F r0 ots and start over. 

Since there is no overlap between the segments searched by the processors and 
since the perfect squares appear to be distributed evenly throughout the principal 
cycles, this parallelization should be efficient for any number of processors. There are 
two hazards when choosing selecting the size of the segment. If the segment size is too 
small, the processors will finish their segments so quickly that receiving new segments 
will become a bottleneck. Alternately, if the segments are too long, the processors 
may divide up more than the entire cycle, so that there is overlap. However, except 
for rare numbers that will factor fast regardless, there is significant room in between 
these two bounds. 

Remark. The segments based parallelization described here has been implemented in 
C using MPI and run on a 64 processor SGI Origin 2800. Detailed results and compar- 
isons to the multipliers method can be found in McMath [M]. Initial results indicate 
that the segments method does indeed continue to be efficient when the number of 
processors is increased. 

The parallelization of SQUFOF by segments involves exactly the same formulas 
as the parallelization of the continued fraction factoring algorithm. This was done 
in 1987 by Williams andWunderlich [8]. Algorithm 5 of the manuscript is same as 
Algorithm 4 of [8] , although the former is couched in terms of binary quadratic forms 
while the latter uses continued fractions. The equivalence between binary quadratic 
forms and continued fractions is well known. 

In order to evaluate the efficiency of the segment-based parallelization, we imple- 
mented it and compared it empirically to an implementation of the multiplier-based 
version. The test integers were all products of randomly chosen primes of roughly 
equal size. Primes of size 80 bit, 100 bit and 120 bit were all tested on 20, 30, 40, and 
50 processors. This allows an analysis of both how each algorithm is affected by the 
size of the integers and how efficiently each algorithm uses an increasing number of 
processors. 

4.2 Multipliers 

In 1982, D. Shanks and H. Cohen attempted a parallelization by having multiple 
processors attempt to factor N, 3N, 5N, etc. Gower's recent Ph.D. thesis (under 
S. Wagstaff) |Go| analyzed the use of multipliers and found them to be effective in 
general but didn't provide much evidence on their efficiency for parallelization. 

The multipliers technique of Gowers- Wagstaff involves generating multiple version 
of the factorization algorithm by multiplying N by products of small square-free num- 
bers ki. Each product yields a new number Mi which can be factored on a single 
processor of a parallel machine. If processor i discovers a factor of Mi that is not 
from ki, then a factor of N has been found. Parallel SQUFOF using multipliers was 
considered by Shanks and H. Cohen (when Cohen visited Shanks at the University of 
Maryland in 1983, mentioned to the second author in a private conversation), men- 
tioned by Williams (|W|. page 645, as an interesting line of research), and S. Wagstaff 
and his students (most recently J. Gower |Gop . 
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A quick survey of our data for the average runtime shows that for the segments 
parallelization, the runtime was cut in half from 20 processors to 50, while the mul- 
tipliers implementation didn't do quite so well. The data indicates that the efficient 
use of multiple processors for the segments parallelization is roughly unaffected by 
increasing the number of processors, while the multipliers parallelization is less effi- 
cient at using a larger number of processors. This is the expected result. As Jason 
Gower demonstrated in |Go| . the use of a multiplier can decrease the runtime by an 
average of 27%. Therefore, for small numbers of processors, using multipliers should 
immediately cut the runtime down. However, for larger numbers of processors, the 
multipliers available aren't used as efficiently. 

Although the data isn't completely clear, the trend is toward segments being faster 
than multipliers if enough processors are used. Based on the averages, a linear regres- 
sion predicts a crossover at 80 processors and a quadratic regression predicts a crossover 
at 47 processors. The correct answer is probably somewhere within that range, but 
even with extensive testing, it would be hard to pin down the crossover exactly due 
to the large standard deviations arising in the data. 

5 Conclusion 

This paper, aside from presenting SQUFOF in its entirety for the first time, has shown 
that the algorithm can be presented in terms of an elegent theoretical framework using 
two-sided continued fractions and class groups of quadratic forms over a real quadratic 
field. It further proved the infrastructure distance formula on the cycle of forms in the 
class group. 
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